AITopics | robust fine-tuning

Collaborating Authors

robust fine-tuning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RAMP: Boosting Adversarial Robustness Against Multiple l p Perturbations for Universal Robustness

Neural Information Processing SystemsFeb-12-2026, 18:58:16 GMT

A T models' multiple-norm robustness (union accuracy) is still low, which is crucial since in the real-world an adversary is not necessarily bounded by a single

artificial intelligence, machine learning, robustness, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Champaign County > Urbana (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Geodesic Multi-Modal Mixup for Robust Fine-Tuning

Neural Information Processing SystemsDec-26-2025, 11:52:59 GMT

Pre-trained multi-modal models, such as CLIP, provide transferable embeddings and show promising results in diverse applications. However, the analysis of learned multi-modal embeddings is relatively unexplored, and the embedding transferability can be improved. In this work, we observe that CLIP holds separated embedding subspaces for two different modalities, and then we investigate it through the lens of \textit{uniformity-alignment} to measure the quality of learned representation. Both theoretically and empirically, we show that CLIP retains poor uniformity and alignment even after fine-tuning. Such a lack of alignment and uniformity might restrict the transferability and robustness of embeddings. To this end, we devise a new fine-tuning method for robust representation equipping better alignment and uniformity. First, we propose a \textit{Geodesic Multi-Modal Mixup} that mixes the embeddings of image and text to generate hard negative samples on the hypersphere.

geodesic multi-modal mixup, name change, robust fine-tuning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.36)

Add feedback

Fast Trainable Projection for Robust Fine-tuning

Neural Information Processing SystemsDec-24-2025, 06:12:56 GMT

Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. Recently, projected gradient descent has been successfully used in robust fine-tuning by constraining the deviation from the initialization of the fine-tuned model explicitly through projection. However, algorithmically, two limitations prevent this method from being adopted more widely, scalability and efficiency. In this paper, we propose a new projection-based fine-tuning algorithm, Fast Trainable Projection (FTP) for computationally efficient learning of per-layer projection constraints, resulting in an average 35% speedup on our benchmarks compared to prior works. FTP can be combined with existing optimizers such as AdamW, and be used in a plug-and-play fashion. Finally, we show that FTP is a special instance of hyper-optimizers that tune the hyper-parameters of optimizers in a learnable manner through nested differentiation. Empirically, we show superior robustness on OOD datasets, including domain shifts and natural corruptions, across four different vision tasks with five different pre-trained models. Additionally, we demonstrate that FTP is broadly applicable and beneficial to other learning scenarios such as low-label and continual learning settings thanks to its easy adaptability. The code will be available at https://github.com/GT-RIPL/FTP.git.

fast trainable projection, name change, robust fine-tuning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

RAMP: Boosting Adversarial Robustness Against Multiple l p Perturbations for Universal Robustness

Neural Information Processing SystemsOct-10-2025, 01:50:47 GMT

A T models' multiple-norm robustness (union accuracy) is still low, which is crucial since in the real-world an adversary is not necessarily bounded by a single

accuracy, robustness, union accuracy, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Champaign County > Urbana (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Feed Two Birds with One Scone: Exploiting Function-Space Regularization for Both OOD Robustness and ID Fine-Tuning Performance

Yuan, Xiang, Shu, Jun, meng, Deyu, Xu, Zongben

arXiv.org Artificial IntelligenceSep-9-2025

Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. To remedy this, most robust fine-tuning methods aim to preserve the pretrained weights, features, or logits. However, we find that these methods cannot always improve OOD robustness for different model architectures. This is due to the OOD robustness requiring the model function to produce stable prediction for input information of downstream tasks, while existing methods might serve as a poor proxy for the optimization in the function space. Based on this finding, we propose a novel regularization that constrains the distance of fine-tuning and pre-trained model in the function space with the simulated OOD samples, aiming to preserve the OOD robustness of the pre-trained model. Besides, to further enhance the OOD robustness capability of the fine-tuning model, we introduce an additional consistency regularization to promote stable predictions of perturbed samples. Extensive experiments demonstrate our approach could consistently improve both downstream task ID fine-tuning performance and OOD robustness across a variety of CLIP backbones, outperforming existing regularization-based robust fine-tuning methods.

artificial intelligence, machine learning, robustness, (17 more...)

arXiv.org Artificial Intelligence

2509.05328

Country: Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Heuillet, Maxime, Bhagwatkar, Rishika, Ngnawé, Jonas, Pequignot, Yann, Larouche, Alexandre, Gagné, Christian, Rish, Irina, Ahmad, Ola, Durand, Audrey

arXiv.org Artificial IntelligenceAug-21-2025

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with significant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation protocols, yielding 1,440 training configurations and 7,200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.14079

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Transportation (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Robust Fine-tuning of Zero-shot Models via Variance Reduction

Neural Information Processing SystemsMay-27-2025, 08:19:10 GMT

When fine-tuning zero-shot models like CLIP, our desideratum is for the fine-tuned model to excel in both in-distribution (ID) and out-of-distribution (OOD). Recently, ensemble-based models (ESM) have been shown to offer significant robustness improvement, while preserving high ID accuracy. However, our study finds that ESMs do not solve the ID-OOD trade-offs: they achieve peak performance for ID and OOD accuracy at different mixing coefficients. When optimized for OOD accuracy, the ensemble model exhibits a noticeable decline in ID accuracy, and vice versa. In contrast, we propose a sample-wise ensembling technique that can simultaneously attain the best ID and OOD accuracy without the trade-offs.

accuracy, ood accuracy, robust fine-tuning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Add feedback

Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models

Neural Information Processing SystemsMay-26-2025, 19:18:47 GMT

Modern optimizers such as AdamW, equipped with momentum and adaptive learning rate, are designed to escape local minima and explore the vast parameter space. This exploration is beneficial for finding good loss basins when training from scratch. It is not necessarily ideal when resuming from a powerful foundation model because it can lead to large deviations from the pre-trained initialization and, consequently, worse robustness and generalization. At the same time, strong regularization on all parameters can lead to under-fitting. We hypothesize that selectively regularizing the parameter space is the key to fitting and retraining the pre-trained knowledge. This paper proposes a new weight decay technique, Selective Projection Decay (SPD), that selectively imposes a strong penalty on certain layers while allowing others to change freely.

artificial intelligence, machine learning, rethinking weight decay, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Add feedback

Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

Huang, Chengyue, Tian, Junjiao, Maneechotesuwan, Brisa, Chopra, Shivang, Kira, Zsolt

arXiv.org Artificial IntelligenceFeb-21-2025

Robust fine-tuning aims to adapt large foundation models to downstream tasks while preserving their robustness to distribution shifts. Existing methods primarily focus on constraining and projecting current model towards the pre-trained initialization based on the magnitudes between fine-tuned and pre-trained weights, which often require extensive hyper-parameter tuning and can sometimes result in underfitting. In this work, we propose Directional Gradient Projection (DiGraP), a novel layer-wise trainable method that incorporates directional information from gradients to bridge regularization and multi-objective optimization. Besides demonstrating our method on image classification, as another contribution we generalize this area to the multi-modal evaluation settings for robust fine-tuning. Specifically, we first bridge the uni-modal and multi-modal gap by performing analysis on Image Classification reformulated Visual Question Answering (VQA) benchmarks and further categorize ten out-of-distribution (OOD) VQA datasets by distribution shift types and degree (i.e. near versus far OOD). Experimental results show that DiGraP consistently outperforms existing baselines across Image Classfication and VQA tasks with discriminative and generative backbones, improving both in-distribution (ID) generalization and OOD robustness.

dataset, digrap, fine-tuning, (15 more...)

arXiv.org Artificial Intelligence

2502.15895

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.48)

Technology: